The data in this task contains information on detection of two types of mosquitoes, Aedes aegypti and Aedes albopictus in the world. A few of the variables that was collected in the data are location and detection time.
In this task the detection of both types of mosquitoes are presented for the years 2004 and 2013. Figure 1 presents the data collected year 2004 and figure 2 the data collected year 2013.
Figure 1: Detected mosquitoes of the types Aedes aegypti and Aedes albopictus in the world during year 2004
Figure 2: Detected mosquitoes of the types Aedes aegypti and Aedes albopictus in the world during year 2013
In figure 1, there are a few countries that detected high densities
of mosquitos, these are: United States, Brazil, Venezuela and
Taiwan.
In United Stated, aegypti mosquitoes were detected in the southern area
and albopictus mosquitoes in the south-eastern area.
In Brazil the mosquitoes were detected in the east and south, a few of
the type albopictus and the rest of the type aegypti.
In Venezuela mosquitoes were detected in the northern areas, all of the
type aegypti.
In Taiwan mosquitoes were detected in the southern areas, all of the
type albopictus.
In figure 2, there are a two countries that detected higher densities
of mosquitoes compared to figure 1, these are: Brazil and Taiwan.
With a zoomed out picture of Brazil, it appears that the mosquitoes
could be detected all over the country, but with a lower density in the
north-western area. Only the type aegypti were detected.
With a zoomed out picture of Taiwan, it appears that only the central
and eastern part of the country did not detect any mosquitoes, the rest
of the country detected the type aegypti. A closer inspection reveals
that mostly the south-western part of the country had a high density of
mosquitoes and the rest of the country the density was lower.
For the rest of the world it appears that the number of detected
mosquitoes has decreased.
In figure 2 a perception problem of overplotting was highlighted for Brazil and Taiwan. The detection of mosquitoes are represented as dots and the size of the dots on the screen remains the same size whether the figure is zoomed in or out. This particular is a problem when inspecting a zoomed out figure, where the dots overlap, leading to the illusion that almost the whole country have detected mosquitoes for the year 2013.
The number of detected mosquitoes per country during all study period have been calculated as \(Z\). Figure 3 presents the value of \(Z\) for all countries in the world.
Figure 3: Number of mosquitos detected in each country during all study period, the map have an equirectangular projection.
In figure 3 the three countries with the most detected cases were Taiwan with 24837 cases, Brazil with 8501 cases and United States with 2044 cases. All other countries have significant lower detected cases than Taiwan. Mapping frequencies of detected mosquitoes to colour will therefore lead to the colour of all other countries to be similar. The map contains little information since it appears that almost all countries have the same colour.
In this task the computed value \(log(Z)\) is used (\(Z\) were computed from task 1.2).
Figure 4 presents an equirectangular projection with mapping of colour to \(log(Z)\). Figure 5 presents a conic equal area projection with mapping of colour to \(log(Z)\).
Figure 4: Logarithmised number of mosquitos detected in each country during all study period, the map have an equirectangular projection.
Figure 5: Logarithmised number of mosquitos detected in each country during all study period, the map have an conic equal area projection.
In figure 4 it is easier to compare the frequencies of detected mosquitoes for different countries with each other compared to figure 3. It appears that most detection of mosquitoes are in the continents: North America, South America, Asia, and Oceania. It is however more difficult to estimate the absolute frequency for each country since the values have to be calculated back from log.
In figure 5 the projection of the map is conic equal area, this means
that the area of each country in the map is representative of the size
in the real world. However the projection presents the map in a
different shape than most people are used to.
In figure 4 the projection is not equal area nor conformal, this means
that areas and the shapes of countries in the map is not correct to the
real world. The map is however thematic, which makes it easier to
inspect compared to figure 5.
In this task Brazil has been discretised into 100x100 squares. In each square the number of observation have been counted, the mean value of longitude and latitude of the observations within each groups have been calculated. The result is presented in figure 6.
Figure 6: Detected mosquitoes in Brazil for the year 2013.
In figure 6 most of the density is along the east coast and to the south. Comparing figure 6 to figure 2 it is easier to compare smaller areas in the country, since the frequency is colour coded.
The data in this task is read and processed with the following code.
# Task 2.1
SCB_file <- read.csv("sweden_income.csv", fileEncoding = "iso-8859-1")
SWE_map <- fromJSON(file="gadm41_SWE_1.json")
colnames(SCB_file) <- c("region","age","income")
SCB_file[SCB_file$age == "18-29 years", 2] <- "Young"
SCB_file[SCB_file$age == "30-49 years", 2] <- "Adult"
SCB_file[SCB_file$age == "50-64 years", 2] <- "Senior"
SCB_file <- SCB_file %>%
pivot_wider(
names_from = age,
values_from = income
)
In figure 7 violin plots for the age groups “Young”, “Adult” and “Senior” are presented.
Figure 7: Violin plots showing mean income distributions per age group
In figure 7, it can be observed that the “Young” age group has the
lowest average income compared to the three age groups. The median
income is fairly similar between the “Adult” and “Senior” groups, but
the variance is higher in the “Senior” age group.”Young” and “Adult”
each have two outliers, whereas the “Senior” age group has just one
outlier, all of which represent mean incomes significantly higher than
the median mean income across all age groups. For the age groups “Adult”
and “Senior,” the distribution is skewed to the right, while for the age
group “Young,” the distribution more closely resembles a normal
distribution.
The mode for all age groups are close to their median.
In figure 8 the dependence of “Senior” incomes on “Adult” and “Young” income are visualised.
Figure 8: Surface plot in Plotly showing dependence of Senior incomes on Adult and Young incomes in various counties
In figure 8, you can see that there is a positive trend between “Senior” and both “Young” and “Adult”, which means that if a region has a high relative income for “Young” it also has a high relative income for “Senior”. The same conclusion is drawn between “Adult” and “Senior”. This positive trend may indicate that regions with better economic conditions tend to benefit all age groups. We think that linear regression can be a suitable to model to predict mean income of “Senior” with “Young” and “Adult” as dependent variables, due to the positive correlation among the three age groups.
Figure 9 shows the distribution of mean incomes among the “Young” age group across different counties and figure 10 shows the distribution of mean incomes among the “Adult” age group.
Figure 9: Incomes of Young in different counties
Figure 10: Incomes of Adults in different counties
In figure 9 it is clear to see that the Stockholm region has the highest mean income among the regions. It also appears that regions in northern Sweden have on average a lower mean income than southern Sweden. The region with the lowest mean income is Västerbotten.
In figure 10, Stockholm also has the highest mean income, and it is also evident that northern Sweden has lower incomes than southern Sweden. The region with the lowest mean income is Gävleborg.
The choropleth maps gives new information in such way that you can
see values for each individual county, this allows comparison between
them and also identify the outliers. In figure 7 there were two outliers
for the group “Young”, in figure 9 these two outliers can be identified
as Stockholm and Halland. For the age group “Adult”, the same two
outliers were identified in figure 10.
It is also possible to divide Sweden into different clusters
In figure 11 the location of Linköping has been added to the map from figure 10.
Figure 11: Incomes of Adults in different counties with Linköping marked with a red dot
We worked on the tasks individually before the data labs, Duc on task 1 and William on task 2. We later solved the other task and compared our solutions.
Most of the text written by Duc.
Most of the text written by William.
# Task 1.1
df_mosq %>%
filter(YEAR==2004) %>%
plot_mapbox() %>%
add_markers(type="scattermapbox",
y=~Y,
x=~X,
color=~VECTOR)
df_mosq %>%
filter(YEAR==2013) %>%
plot_mapbox() %>%
add_markers(type="scattermapbox",
y=~Y,
x=~X,
color=~VECTOR)
# Task 1.2
df_z <- as.data.frame(table(df_mosq$COUNTRY))
df_temp <- df_mosq %>% select(COUNTRY, COUNTRY_ID) %>% unique()
df_z <- merge(x=df_z, y=df_temp,
by.x="Var1",
by.y="COUNTRY")
colnames(df_z) <- c("country", "freq", "code")
# World map
g <- list(
projection = list(type = "equirectangular")
)
plot_geo(df_z) %>%
add_trace(
z = ~freq, color= ~freq, color = "Blues", text = ~country, locations = ~code
) %>%
layout(
geo = g
) %>%
layout(showlegend = FALSE) %>%
colorbar(title = "Frequency")
# Task 1.3a
g <- list(
projection = list(type = "equirectangular")
)
plot_geo(df_z) %>%
add_trace(
z = ~log(freq), color= ~freq, color = "Blues", text = ~country, locations = ~code
) %>%
layout(
geo = g
) %>%
layout(showlegend = FALSE) %>%
colorbar(title = "log(frequency)")
# Task 1.3b
g <- list(
projection = list(type = "conic equal area")
)
plot_geo(df_z) %>%
add_trace(
z = ~log(freq), color= ~freq, color = "Blues", text = ~country, locations = ~code
) %>%
layout(
geo = g
) %>%
layout(showlegend = FALSE) %>%
colorbar(title = "log(frequency)")
# Task 1.4
df_brazil <-
df_mosq %>%
filter(YEAR==2013 & COUNTRY == "Brazil")
df_brazil$X_int <- cut_interval(df_brazil$X, n=100)
df_brazil$Y_int <- cut_interval(df_brazil$Y, n=100)
data_plot <-
df_brazil %>%
group_by(X_int, Y_int) %>%
summarise(mean_x = mean(X),
mean_y = mean(Y),
Count = n())
data_plot %>%
plot_mapbox() %>%
add_markers(type="scattermapbox",
y=~mean_y,
x=~mean_x,
color=~Count) %>%
layout(showlegend = FALSE) %>%
colorbar(title = "Frequency")
# Task 2.1
SCB_file <- read.csv("sweden_income.csv", fileEncoding = "iso-8859-1")
SWE_map <- fromJSON(file="gadm41_SWE_1.json")
colnames(SCB_file) <- c("region","age","income")
SCB_file[SCB_file$age == "18-29 years", 2] <- "Young"
SCB_file[SCB_file$age == "30-49 years", 2] <- "Adult"
SCB_file[SCB_file$age == "50-64 years", 2] <- "Senior"
SCB_file <- SCB_file %>%
pivot_wider(
names_from = age,
values_from = income
)
# Task 2.2
plot <- plot_ly() %>%
add_trace(data = SCB_file, x = ~"Young", y = ~Young,
type = "violin", name = "Young", box=list(visible=TRUE)) %>%
add_trace(data = SCB_file, x = ~"Adult", y = ~Adult,
type = "violin", name = "Adult", box=list(visible=TRUE)) %>%
add_trace(data = SCB_file, x = ~"Senior", y = ~Senior,
type = "violin", name = "Senior", box=list(visible=TRUE)) %>%
layout(xaxis = list(title = "Age Group",
categoryorder = "array", categoryarray = c("Young", "Adult", "Senior")),
yaxis = list(title = "Mean income"))
plot
# Task 2.3
s <- interp(SCB_file$Young, SCB_file$Adult, SCB_file$Senior, duplicate = "mean")
plot2 <- plot_ly(data = SCB_file,
x=~s$x,
y=~s$y,
z=~s$z,
type = "surface",
colorbar = list(title = "Mean senior Income")) %>%
layout(scene = list(
xaxis = list(title = "Mean adult Income"),
yaxis = list(title = "Mean young Income"),
zaxis = list(title = "Mean senior Income"),
showlegend = FALSE
))
plot2
# Task 2.4
# removing county ID and "county" from string
SCB_file$region <- SCB_file$region %>% str_sub(., 4, nchar(SCB_file$region) - 7)
SCB_file[SCB_file$region=="Västra Götaland",1] <- "VästraGötaland"
SCB_file[SCB_file$region=="Örebro",1] <- "Orebro"
# Young
g <- list(fitbounds="locations", visible=FALSE)
plot3 <- plot_geo(SCB_file) %>%
add_trace(type="choropleth",geojson=SWE_map, locations=~region,
z=~Young, featureidkey="properties.NAME_1") %>% layout(geo=g) %>%
layout(showlegend = FALSE) %>%
colorbar(title = "Mean Young Income")
plot3
# Adult
plot4 <- plot_geo(SCB_file) %>%
add_trace(type="choropleth",geojson=SWE_map, locations=~region,
z=~Adult, featureidkey="properties.NAME_1") %>% layout(geo=g) %>%
layout(showlegend = FALSE) %>%
colorbar(title = "Mean Adult Income")
plot4
# Task 2.5
plot5 <- plot_geo(SCB_file) %>%
add_trace(type="choropleth",geojson=SWE_map, locations=~region,
z=~Adult, featureidkey="properties.NAME_1") %>% layout(geo=g) %>%
add_trace(y =58.41109, x=15.62565, color =I("red"), text = "Linköping", type="markers") %>%
layout(showlegend = FALSE) %>%
colorbar(title = "Mean Adult Income")
plot5